1. INTRODUCTION

Rapid advances in computer graphic (CG) image rendering techniques have given rise to applications
such as animation, cartoons, gaming, photo-realism, virtual reality and many more [1]. Sophisticated CG
software tools allow users to produce synthetic images that are so close to reality that it is difficult to
tell whether an image was captured by a camera or generated by a computer [2], [3]. If such images are used
illegally in a court of law, journalism, criminal investigation, or political propaganda, they may pose a
serious threat to society [4]. In such cases, verifying the authenticity of images is a major challenge in
digital image forensics.

To evaluate models that distinguish photo-realistic computer graphics (PRCG) from photographic images (PG), three
benchmark datasets, shown in Table 1, are used in the literature: i) the Columbia Photographic Images and
Photorealistic Computer Graphics Dataset (Columbia Dataset); ii) the DSTok dataset; and iii) the dataset created by
Rahmouni et al. The DSTok dataset is the largest of these, with 9,700 samples. These datasets lack diversity in
image content and are limited in sample size; more importantly, their images were produced or captured using older


Journal homepage: http://ijai.iaescore.com 


138 0 ISSN: 2252-8938 


versions of CG software and camera models. Advances in digital imaging and rendering techniques have made it
easy for users to capture high-quality photographs and to produce photograph-like
images whose quality cannot be compared with that of graphical content produced using older versions of CG image
rendering techniques. Hence, the datasets also need to be updated in order to evaluate recent innovations in the
field of CG and PG image classification.


Table 1. Comparison of existing CG and PG image datasets

Dataset               Year  CG images  PG images                                         Dataset size  Publicly available?  Limitations
Columbia Dataset [5]  2004  800        Personal: 800; Google: 800; Recaptured PRCG: 800  3,200         Yes                  Small dataset; images are restricted to a few categories.
DSTok Dataset [6]     2013  4,850      4,850                                             9,700         No                   The number of PRCG images in the CG image class is relatively small.
Rahmouni et al. [7]   2017  1,800      1,800                                             3,600         Yes                  Small dataset; the CG image class consists only of video game screenshots.


In this paper, we propose two new datasets, namely the ‘JSSSTU CG and PG image dataset’ and the
‘JSSSTU PRCG image dataset’. The former is created with the intention of having diversified image
content across the CG and PG image categories and comprises 14,000 samples. The latter is created
with the intention of containing only photo-realistic computer graphics that are hard to distinguish with the naked
eye and consists of 2,000 samples. We expect the new datasets to be challenging and helpful for
researchers who are developing efficient or improved classification models for the
research problem: “classification of computer graphic images and photographic
images”. Researchers have addressed this problem from different perspectives, based on conventional machine
learning and deep learning approaches.


a. Conventional machine learning 

Significant progress has been made in recent years in classifying CG and PG images. Existing
conventional machine learning techniques can be grouped into three categories based on the features selected
for classification: i) camera-characteristic based approaches [4], [8]-[12]; ii) spatial feature based
approaches [13]-[20]; and iii) geometric feature based approaches [21]-[28].
—  Camera-characteristic based approaches 

CG and PG images are generated through different pipelines. Since PG
images are acquired using digital cameras, they exhibit distinct intrinsic properties that are not
present in CG images. Based on this fact, some identification approaches have been described in [8]-[10].
Dehnie et al. [8] employed the pattern noise caused by defects in camera sensors to classify CG
and PG images. Dirik et al. [9] proposed features that detect traces of the color filter array (CFA) and
chromatic aberration to distinguish CG and PG images. Khanna et al. [10] described a method based on
residual pattern noise to distinguish scanner, CG and PG images. Photo response non-uniformity (PRNU)
noise is used as a digital fingerprint to identify the source camera in digital forensics, and this is exploited in
[4], [11], [12]. Peng et al. [11] proposed a method based on the theory of the multifractal spectrum, in which
multifractal spectrum features of PRNU are extracted from an image to distinguish PRCG and PG
images. Peng and Zhou [4] examined the changes in PRNU correlations and used histogram features extracted from
variance histograms of PRNU to identify CG and PG images. Long et al. [12] proposed a
method based on binary measures computed from PRNU noise in the RGB channels to capture the differences
between CG and PG images.
— Spatial feature based approaches 

Pan et al. [13] showed that the perceptual difference between CG and PG images is generally present
in color and coarseness. The former is represented using fractal dimensions and the latter is described using
generalised dimensions. Wu et al. [14] computed the difference in the histograms of images and used some higher
histogram bins as features for classification. Local binary pattern (LBP) [15] is a
texture descriptor widely used in image texture analysis, and it is employed in [16], [17] to classify CG
and PG images. Peng et al. [18] proposed a method based on statistical and textural features. Tan et al. [19]
presented a scheme using local ternary count (LTC), which produces 54-dimensional features from
normalized histograms. Peng et al. [20] proposed a hybrid feature by analysing the differences in the textures of
residuals of CG and PG images.
— Geometric feature based approaches

Wang and Moulin [21] used wavelet-coefficient histograms as features, extracted from a wavelet-based
statistical model, to discriminate CG and PG images. Chen et al. [22] built an alpha-stable model
to describe the wavelet decomposition coefficients of PG images; fractional
lower order moments are extracted in the wavelet domain. Zhang and Wang [23] found that imaging features and visual features of
images produced using different image acquisition processes reveal different statistical regularities in the
wavelet domain. Based on this principle, statistical features and the cross correlation of wavelet coefficients are
extracted from each sub-band and used as features. Guo and Wang [24] presented a method based on
multiwavelets, which extracts features in the wavelet sub-bands. Fan et al. [25] proposed a scheme based on a
modified image contour transform in the HSV color space to classify CG and PG images; statistics such as
average value, variance, skewness and kurtosis are computed in the wavelet domain. Birajdar and Mankar
[26] used the discrete wavelet transform to extract binary statistical image features by decomposing an image
into sub-bands, and then employed a fuzzy entropy measure to select relevant features. The quaternion wavelet
transform is presented in [27] and [28], where it is used to extract statistical features to classify CG and PG images.


b. Deep learning approaches 

Rahmouni et al. [7] proposed a scheme that combines statistical feature extraction with a
convolutional neural network (CNN) architecture; the class label of the entire image is then predicted using a
weighted voting scheme that aggregates the local estimates of the class probabilities. Nguyen et al. [29]
customized the VGG-19 architecture to extract generic features in the first three convolutional layers,
followed by a statistical pooling layer constructed as proposed in [7]. Pre-trained CNN models are employed in
[30]-[33] and fine-tuned through transfer learning for binary classification. Chawla et al. [34] proposed a
five-layer CNN architecture that introduces a special layer applying prediction error filters at the first
convolutional layer to capture the correlation between pixels in PG and CG images. To predict the label of
the original picture, two methods are used, namely a weighted voting scheme and a majority voting scheme.
The former labels the image by aggregating the class probabilities, and the latter assigns the label
that appears in the majority of the image patches. Yao et al. [35] employed three types of high-pass
filters to extract sensor noise residuals, which are then fed into the proposed five-layer CNN framework. Quan et al.
[36] proposed a new CNN framework with two cascaded convolutional layers at the end of the
network. He et al. [37] described a deep learning approach that combines a CNN and a recurrent neural
network (RNN). Thereafter, He et al. [38] proposed an attention-based dual-branch CNN to extract
features from combined color components. Meena and Tyagi [39] proposed an ensemble model that
combines the features produced by a pre-trained VGG-19 CNN with noise features produced using high-pass
filters to discriminate CG and PG images.

The above study shows that, even though much progress has been made in the classification of CG and
PG images, the existing techniques and the datasets used to evaluate their performance still have the following
limitations: i) in the existing datasets, the sample size and image content are limited and do not reflect the
high-quality image content made possible by advances in image rendering techniques; ii) in prior works, the
accuracy of the classification model depends on the choice of the feature descriptor used for classification.

In the task of distinguishing CG and PG images, the contributions of the paper are as follows:

— Owing to the non-availability of a large, heterogeneous dataset containing CG and PG images, the ‘JSSSTU CG
and PG image dataset’ is created. In addition, the ‘JSSSTU PRCG image dataset’, which exhibits high
photo-realism, is created.

— The effectiveness of existing texture based feature descriptors and CNN based deep learning techniques
is investigated on our new datasets and on benchmark datasets.

The remainder of this paper is organised as follows: section 2 describes the state-of-the-art
techniques based on conventional machine learning and deep learning. Section 3 demonstrates the
performance of these techniques on our new datasets and on benchmark datasets through experimental results.
The conclusion is given in section 4.


2. RESEARCH METHOD 

In this section, state-of-the-art conventional machine learning and CNN based deep learning
techniques are implemented for the task of classifying CG and PG images. Handcrafted textural features are
used for conventional machine learning, and pre-trained CNN models of the VGG family are used for
deep learning. They are described in the following sections.
2.1. Conventional machine learning techniques

Texture describes the surface characteristics of an image. Two widely used texture descriptors, the
gray level co-occurrence matrix (GLCM) [40] and LBP [15], are employed to analyse the surface texture of
CG and PG images. The surface texture of a CG image appears smoother than that of a PG image, which is a
basic difference between them. Hence, these texture descriptors are used in our work.


2.1.1. GLCM descriptor 

GLCM is a statistical method that counts how often pairs of pixels with given gray levels occur at a
particular orientation and distance over an image or image region. It is parameterized by (θ, d),
where θ represents the orientation and d is the distance between two picture elements. The GLCM descriptor
allows rotational invariance and is defined over 8 orientations separated by π/4 radians. Haralick et al. [40]
defined 14 statistical properties computed from the normalized GLCM. In this work, we employ
four of them: contrast, correlation, energy and homogeneity.
— Contrast: contrast measures the local intensity variation between a picture element and its neighbor over
the entire image, as given in (2). Its range is given in (1).

$$\text{Range} = \left[0,\ (\text{size}(\text{GLCM}, 1) - 1)^2\right] \tag{1}$$

where GLCM represents the normalized matrix. In (2)-(5), $I_{mn}$ denotes the value of the $(m, n)$th
entry of the normalized GLCM, and $m$ and $n$ index its rows and columns.

$$\text{Contrast} = \sum_{m,n} |m - n|^2 \, I_{mn} \tag{2}$$


— Correlation: correlation measures how correlated a picture element is to its neighbor over the entire image.
It returns a value between -1 and 1, reaching 1 or -1 for a perfectly positively or negatively correlated image;
for a constant image the value is undefined. It is given in (3), where $\mu_m$, $\mu_n$ and $\sigma_m$, $\sigma_n$
indicate the means and standard deviations of the marginal distributions associated with $I_{mn}/R$, and $R$ is a normalizing constant.

$$\text{Correlation} = \sum_{m,n} \frac{(m - \mu_m)(n - \mu_n)\, I_{mn}}{\sigma_m \sigma_n} \tag{3}$$


— Energy: energy, also termed the angular second moment, is the sum of the squared elements of the GLCM.
It returns a value between 0 and 1, and returns 1 for a constant image, as given in (4).

$$\text{Energy} = \sum_{m,n} (I_{mn})^2 \tag{4}$$


— Homogeneity: homogeneity measures how close the distribution of GLCM elements is to the GLCM diagonal. It
returns a value between 0 and 1, and returns 1 for a diagonal GLCM, as given in (5).

$$\text{Homogeneity} = \sum_{m,n} \frac{I_{mn}}{1 + |m - n|} \tag{5}$$

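The four properties in (2)-(5) can be illustrated with a short NumPy sketch. This is a minimal illustration, not the MATLAB implementation used in the experiments: the helper names `glcm` and `glcm_properties`, the 4-level toy image and the single offset (θ = 0, d = 1) are assumptions made for the example.

```python
import numpy as np

def glcm(image, levels, dx, dy):
    """Normalized co-occurrence matrix for the pixel offset (dx, dy)."""
    counts = np.zeros((levels, levels), dtype=np.float64)
    rows, cols = image.shape
    for r in range(rows):
        for c in range(cols):
            r2, c2 = r + dy, c + dx
            if 0 <= r2 < rows and 0 <= c2 < cols:
                counts[image[r, c], image[r2, c2]] += 1
    return counts / counts.sum()

def glcm_properties(p):
    """Contrast, correlation, energy and homogeneity of a normalized GLCM."""
    levels = p.shape[0]
    m, n = np.meshgrid(np.arange(levels), np.arange(levels), indexing="ij")
    # Means and standard deviations of the two marginal distributions.
    mu_m, mu_n = (m * p).sum(), (n * p).sum()
    sigma_m = np.sqrt((((m - mu_m) ** 2) * p).sum())
    sigma_n = np.sqrt((((n - mu_n) ** 2) * p).sum())
    contrast = ((np.abs(m - n) ** 2) * p).sum()
    correlation = ((m - mu_m) * (n - mu_n) * p).sum() / (sigma_m * sigma_n)
    energy = (p ** 2).sum()
    homogeneity = (p / (1.0 + np.abs(m - n))).sum()
    return contrast, correlation, energy, homogeneity

# Toy 4-level image (hypothetical data, for illustration only).
image = np.array([[0, 0, 1, 1],
                  [0, 0, 1, 1],
                  [0, 2, 2, 2],
                  [2, 2, 3, 3]])
p = glcm(image, levels=4, dx=1, dy=0)  # orientation 0, distance 1
contrast, correlation, energy, homogeneity = glcm_properties(p)
```

Repeating the computation for the remaining orientations and combining the results would be one way to obtain the rotation-tolerant descriptor described above; how the orientations were combined in the experiments is not stated here.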
2.1.2. LBP descriptor

LBP is a texture descriptor proposed by Ojala [15], which encodes each pixel of an
image by comparing its neighboring pixels with the center pixel. If the intensity of a neighboring pixel
is greater than or equal to the intensity of the center pixel, the neighboring pixel is marked as 1; otherwise, it is marked as 0.
This results in a binary sequence, which is converted into a decimal number that replaces the
center pixel value. The LBP code of every pixel in an image is computed using (6), and f(s) is given in (7).

$$\text{LBP}(N, R) = \sum_{n=0}^{N-1} f(I_n - I_c)\, 2^n \tag{6}$$

$$f(s) = \begin{cases} 1, & \text{if } s \geq 0 \\ 0, & \text{otherwise} \end{cases} \tag{7}$$

where $I_n$ and $I_c$ indicate the intensities of the neighboring and center pixels respectively, and N represents the number of
neighbors chosen at a radius of R. In this work, we choose N=8 neighbors with a radius R=1.
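A minimal NumPy sketch of (6) and (7) for 8 neighbors at radius 1 is given below. The neighbor ordering in `OFFSETS`, the helper names and the 3x3 toy patch are assumptions made for illustration. The 59-bin histogram uses the common "uniform pattern" mapping (58 uniform codes plus one shared bin), which is assumed here because it matches the 59-dimensional LBP feature reported in the experiments; the text does not state which mapping was used.

```python
import numpy as np

# (row, col) offsets of the 8 neighbours at radius 1; the order is an assumption.
OFFSETS = [(-1, -1), (-1, 0), (-1, 1), (0, 1), (1, 1), (1, 0), (1, -1), (0, -1)]

def lbp_8_1(image):
    """Basic LBP code of every interior pixel (8 neighbours, radius 1)."""
    img = np.asarray(image, dtype=np.int64)
    rows, cols = img.shape
    center = img[1:-1, 1:-1]
    codes = np.zeros_like(center)
    for bit, (dr, dc) in enumerate(OFFSETS):
        neighbour = img[1 + dr:rows - 1 + dr, 1 + dc:cols - 1 + dc]
        # f(s) = 1 if s >= 0, weighted by 2**bit.
        codes += (neighbour >= center).astype(np.int64) << bit
    return codes

def is_uniform(code):
    """True if the circular 8-bit pattern has at most two 0/1 transitions."""
    bits = [(code >> i) & 1 for i in range(8)]
    return sum(bits[i] != bits[(i + 1) % 8] for i in range(8)) <= 2

def uniform_histogram(codes):
    """Normalized histogram: one bin per uniform code plus one shared bin."""
    uniform_codes = [c for c in range(256) if is_uniform(c)]
    lookup = {c: i for i, c in enumerate(uniform_codes)}
    hist = np.zeros(len(uniform_codes) + 1)
    for c in codes.ravel():
        hist[lookup.get(int(c), len(uniform_codes))] += 1
    return hist / hist.sum()

# Toy 3x3 patch (hypothetical data): a single interior pixel with value 6.
patch = np.array([[5, 9, 1],
                  [4, 6, 7],
                  [2, 6, 3]])
codes = lbp_8_1(patch)
features = uniform_histogram(codes)
```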


2.2. Support vector machines (SVM) classifier 

SVM [41] is among the most widely used and effective supervised machine learning algorithms for
classification problems. It can perform both linear and non-linear classification. In this work, we
perform non-linear classification, since the feature vectors could not be separated linearly. The radial basis
function (RBF) kernel is chosen in the experimentation and is described in (8), where $\|u_1 - u_2\|$
represents the Euclidean distance between two feature vectors $u_1$ and $u_2$, and $\sigma^2$ represents the variance.

$$k(u_1, u_2) = \exp\left(-\frac{\|u_1 - u_2\|^2}{2\sigma^2}\right) \tag{8}$$

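As a small worked example of (8), the sketch below evaluates the RBF kernel for two feature vectors. It assumes the common form of the kernel with denominator 2σ²; the function name `rbf_kernel` and the sample vectors are hypothetical.

```python
import numpy as np

def rbf_kernel(u1, u2, sigma):
    """RBF kernel value, assuming the form exp(-||u1 - u2||^2 / (2 * sigma^2))."""
    diff = np.asarray(u1, dtype=float) - np.asarray(u2, dtype=float)
    return float(np.exp(-np.dot(diff, diff) / (2.0 * sigma ** 2)))

same = rbf_kernel([0.2, 0.5], [0.2, 0.5], sigma=1.0)   # identical vectors
apart = rbf_kernel([0.0, 0.0], [1.0, 1.0], sigma=1.0)  # squared distance 2
```

Identical vectors give a kernel value of 1, and the value decays towards 0 as the Euclidean distance grows, at a rate controlled by σ.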
2.3. CNN based deep learning techniques 

Among deep neural networks, CNN based deep learning techniques have shown their
effectiveness by automatically learning features, from general to specific, based on the image content.
Pre-trained CNN models such as AlexNet [42], VGG (VGG16 and VGG19) [43],
GoogLeNet [44], and ResNet [45] have shown great performance in classifying images into 1000 object
categories such as pencil, keyboard, mouse, etc., and have become standard models for classification tasks.
These models are trained on millions of images and have learnt rich feature representations for a wide range
of PG images in the ImageNet database.

Training a deep ConvNet model from scratch on a large dataset can take several days, weeks or even months.
A pre-trained neural network model is a better choice for solving similar kinds of problems on a
smaller dataset. In this work, two pre-trained VGG variants, VGG16 and
VGG19, are adopted because these models use small, fixed-size kernels, which take less time to
process and easily capture small patterns. Transfer learning is applied to perform classification on a new
dataset that contains CG and PG images. It can be carried out in two ways: feature extraction and
fine-tuning. The latter is adopted in our work: the last three layers of the pre-trained neural
network models are replaced, and these layers are fine-tuned for the classification of CG and PG images.


2.3.1. Visual geometry group (VGG) architecture 

The VGG architecture can be viewed as an input layer, feature extraction layers and classification layers.
In the input layer, a color image of fixed size 227x227 is fed to the architecture during training. The image
is pre-processed by subtracting the mean RGB value, computed on the training set, from each pixel. During feature
extraction, the image is passed through a series of convolutional layers that use filters of fixed size 3x3.
Spatial padding and stride are fixed to 1 pixel, which preserves the spatial dimensions after convolution.
The depth of the convolutional layers starts at 64 in the first layer and increases by a factor of 2 after
every maximum pooling layer until it reaches 512. The spatial dimensions are reduced by the maximum
pooling layers, which use a filter of size 2 and a stride of 2. Five maximum pooling layers are
used in the architecture, each following a group of convolutional layers. The classification layers consist of three fully
connected layers: the first two have 1024 neurons each and the third contains a single neuron, as listed in Table 2,
followed by a sigmoid activation function that produces a value in the range 0 to 1 for binary classification, as
described in (9). The VGG variant CNN architectures are presented in Table 2 (the
convolutional layers are denoted as conv(block)-(number of filters)_(layer number within the block); ReLU is not
shown for brevity).


$$S(x) = \frac{1}{1 + e^{-x}} \tag{9}$$


For training, binary cross entropy loss function is used and is given in (10). 


$$\text{Binary cross entropy} = -\frac{1}{M} \sum_{j=1}^{M} \left( y_j \log(p_j) + (1 - y_j) \log(1 - p_j) \right) \tag{10}$$


where M is the number of samples, $p_j$ is the predicted probability of class PG for sample j, $(1 - p_j)$ is the
probability of class CG, and $y_j$ is the binary indicator (0 or 1) of whether PG is the correct class label for sample j. The rectified
linear unit (ReLU), a non-linear activation function f(x)=max(0, x), is used in all hidden layers of the VGG
variants.
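The sigmoid output in (9) and the loss in (10) can be checked numerically with a few lines of NumPy. This is a sketch under stated assumptions: the loss is averaged over samples, labels are encoded as 1 = PG and 0 = CG, and the logits below are made-up values, not outputs of the trained networks.

```python
import numpy as np

def sigmoid(x):
    """Sigmoid activation, mapping any real value into (0, 1)."""
    return 1.0 / (1.0 + np.exp(-x))

def binary_cross_entropy(y, p, eps=1e-12):
    """Mean binary cross entropy; p is clipped to avoid log(0)."""
    y = np.asarray(y, dtype=float)
    p = np.clip(np.asarray(p, dtype=float), eps, 1.0 - eps)
    return float(-np.mean(y * np.log(p) + (1.0 - y) * np.log(1.0 - p)))

logits = np.array([2.0, -1.0, 0.0])  # hypothetical outputs of the last FC layer
labels = np.array([1, 0, 1])         # assumed encoding: 1 = PG, 0 = CG
probs = sigmoid(logits)              # predicted probability of class PG
loss = binary_cross_entropy(labels, probs)
```

The first two samples are classified with reasonable confidence and contribute small terms, while the third sample, with probability 0.5, contributes log(2) to the sum before averaging.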
Table 2. VGG variant CNN configuration: output volume and parameters for the VGG16 and VGG19 architectures

VGG16                 Output volume  Parameters   VGG19        Output volume  Parameters
Image input (227x227 color image)
conv1-64_1            227x227x64     1792         conv1-64_1   227x227x64     1792
conv1-64_2            227x227x64     36928        conv1-64_2   227x227x64     36928
Maximum pooling
conv2-128_1           113x113x128    73856        conv2-128_1  113x113x128    73856
conv2-128_2           113x113x128    147584       conv2-128_2  113x113x128    147584
Maximum pooling
conv3-256_1           56x56x256      295168       conv3-256_1  56x56x256      295168
conv3-256_2           56x56x256      590080       conv3-256_2  56x56x256      590080
conv3-256_3           56x56x256      590080       conv3-256_3  56x56x256      590080
-                     -              -            conv3-256_4  56x56x256      590080
Maximum pooling
conv4-512_1           28x28x512      1180160      conv4-512_1  28x28x512      1180160
conv4-512_2           28x28x512      2359808      conv4-512_2  28x28x512      2359808
conv4-512_3           28x28x512      2359808      conv4-512_3  28x28x512      2359808
-                     -              -            conv4-512_4  28x28x512      2359808
Maximum pooling
conv5-512_1           14x14x512      2359808      conv5-512_1  14x14x512      2359808
conv5-512_2           14x14x512      2359808      conv5-512_2  14x14x512      2359808
conv5-512_3           14x14x512      2359808      conv5-512_3  14x14x512      2359808
-                     -              -            conv5-512_4  14x14x512      2359808
Maximum pooling
FC-1024
FC-1024
FC-1
Sigmoid
Trainable parameters  -              41,343,873   -            -              46,653,569

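The per-layer values in Table 2 follow from two standard formulas: a 3x3 convolution with C_in input channels and C_out filters has (3·3·C_in + 1)·C_out parameters, and a 2x2 pooling with stride 2 maps a spatial size s to floor((s - 2)/2) + 1. The short Python sketch below recomputes them; the block lists and helper names are illustrative and are not taken from the authors' code.

```python
def conv_params(in_channels, out_channels, kernel=3):
    """Weights plus one bias per filter for a kernel x kernel convolution."""
    return (kernel * kernel * in_channels + 1) * out_channels

def pooled_size(size, pool=2, stride=2):
    """Spatial size after max pooling without padding."""
    return (size - pool) // stride + 1

# (number of filters, number of conv layers) for each of the five blocks.
VGG16_BLOCKS = [(64, 2), (128, 2), (256, 3), (512, 3), (512, 3)]
VGG19_BLOCKS = [(64, 2), (128, 2), (256, 4), (512, 4), (512, 4)]

def summarize(blocks, size=227, in_channels=3):
    """Return (spatial size, depth, parameters) per conv layer and the final size."""
    rows = []
    for filters, n_layers in blocks:
        for _ in range(n_layers):
            rows.append((size, filters, conv_params(in_channels, filters)))
            in_channels = filters
        size = pooled_size(size)
    return rows, size

rows16, final16 = summarize(VGG16_BLOCKS)
rows19, final19 = summarize(VGG19_BLOCKS)
for size, depth, params in rows16:
    print(f"{size}x{size}x{depth}  {params}")
```

With a 227x227 input, the spatial size shrinks to 113, 56, 28, 14 and finally 7 after the fifth pooling layer, which matches the output volumes listed in the table.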
3. RESULTS AND DISCUSSION 
3.1. Dataset collection 


The ‘JSSSTU CG and PG image dataset’ consists of two image categories, CG and PG images, with 7,000
samples in each class and diversified content. CG images are collected from various reliable
computer graphics websites, and PG images are captured using different camera models (standalone cameras and in-built
mobile cameras), since camera specifications vary across models in terms of megapixel count, image
quality, sensor type and so on. To improve the diversity of PG image content, additional PG images are collected from other
sources: INRIA [46], ICCV09 [47], and the McGill calibrated colour image database [48]. The contents of the CG
image class include 3D models, architecture, cartoons, digital art, non-PRCG images, objects, people, PRCG
images, textures, trademarks, vector maps and video gaming. The PG image class covers a wide range of image
content: animals, buildings, man-made objects, indoor scenes, outdoor scenes, nature, vehicles and so on.
The ‘JSSSTU PRCG image dataset’ contains 2,000 samples that demonstrate high photo-realism.

The online sources used to create the CG image dataset are given in Table 3. The camera models and other
sources used to create the PG image dataset are given in Tables 4 and 5. The online sources used to create
the PRCG image dataset are given in Table 6. These datasets are made publicly available to the
research community at the following link: https://sites.google.com/view/hrchennamma/research-activities/jssstu-data-sets.


Table 3. Online sources used to create CG image dataset

Image class: Computer graphic images

Online sources:
https://pixbay.com            https://www.grabcad.com        http://www.3dlinks.com
http://wallpaperlepi.com      https://freestocktextures.com  http://www.realsoft.com
http://www.cgw.com            http://www.acitymap.com        www.digitalrepose.com
http://www.cgsociety.org      http://www.cadnav.com          www.google.com/imghp?hl=EN
https://free3d.com            https://www.nexusmods.com
http://fantasyartdesign.com   https://wallpaper.mob.org


In addition to these datasets, the existing benchmark datasets presented in [5]-[7] are used for
experimentation. The sample size of each class in each dataset is shown in Table 7. Image samples
from the JSSSTU CG and PG image dataset and the JSSSTU PRCG image dataset are shown in Figures 1(a)-(c).

All the images are resized to a dimension of 227x227 pixels. Textual information present in some of
the computer graphic images is cropped out. Each dataset is randomly partitioned into 80% for training (split into 70% for
training and 10% for validation in the case of the pre-trained neural network models) and 20% for testing.
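The random partition described above can be sketched as follows. This is an illustrative NumPy example, not the authors' code: the function name `split_indices`, the fixed seed and the absence of per-class stratification are assumptions.

```python
import numpy as np

def split_indices(n_samples, train=0.7, val=0.1, seed=0):
    """Shuffle sample indices and cut them into train/validation/test parts."""
    rng = np.random.default_rng(seed)
    order = rng.permutation(n_samples)
    n_train = int(round(train * n_samples))
    n_val = int(round(val * n_samples))
    return (order[:n_train],
            order[n_train:n_train + n_val],
            order[n_train + n_val:])

# Sizes for a dataset of 14,000 images (70% / 10% / 20%).
train_idx, val_idx, test_idx = split_indices(14000)
print(len(train_idx), len(val_idx), len(test_idx))
```

For the handcrafted-feature experiments, the training and validation parts would simply be merged to form the 80% training set.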


Table 4. Camera models used to create PG image dataset

Image class           Camera models
Photographic images   Canon PowerShot A2200, NIKON D7100, Nokia C6-01, Canon PowerShot SX200 IS, Vivo 1714,
                      Canon PowerShot SD1100 IS, SAMSUNG GT-S7262, Lenovo K50a40, SONY DSC-WX7, Vivo 1718,
                      Canon PowerShot A400, Canon EOS 1100D, Canon EOS 1000D


Table 5. PG image sources used to create PG image dataset 
PG image source Count 
Personal collection 4,019 


INRIA 1,280 
ICCV09 622 
McGill 1,079 

Total 7,000 


Table 6. Online sources used to create JSSSTU PRCG image dataset

Image class: PRCG images

Online sources:
https://www.3dartistonline.com   https://www.gamespot.com        https://www.chaosgroup.com
https://3dexport.com             https://gizmodo.com             https://app.easyrender.com
https://archicgi.com             http://www.graphicmania.net     https://evermotion.org
https://archvizcamp.com          https://lumion.com              https://www.freepik.com
https://area.autodesk.com        https://www.maxon.net           https://www.ronenbekerman.com
https://www.blenderguru.com      http://www.nextlimit.com        https://www.thearender.com
https://www.cgmeetup.com         https://www.blog.poliigon.com   https://forums.unrealengine.com
https://www.cgtrader.com         https://www.vizpark.com         http://www.3dlinks.com


Figure 1. Image samples from JSSSTU CG and PG image dataset and JSSSTU PRCG image dataset
(a) CG images, (b) PG images, and (c) PRCG images
Table 7. Datasets used in the experiment

Dataset                                    CG images  PG images  Dataset size
Columbia dataset                           800        800        1,600
DSTok dataset                              4,850      4,850      9,700
Rahmouni et al. dataset                    1,800      1,800      3,600
JSSSTU CG and PG image dataset (proposed)  7,000      7,000      14,000
JSSSTU PRCG image dataset (proposed)       2,000      2,000      4,000


3.2. Experiments 
3.2.1. Experiment setup for conventional machine learning techniques 

Experiments are conducted using MATLAB R2018a on an Intel Core i3-4005U processor (1.70 GHz)
with 8 GB RAM. The GLCM and LBP texture features are extracted independently from each image and
have feature dimensions of 4 and 59 respectively. An SVM classifier is used in the experimentation.


3.2.2. Experiment setup for pre-trained neural network models 
VGG variants (VGG16 and VGG19) are implemented using Keras on the Google Colaboratory platform with the
free Tesla K80 GPU and 25 GB RAM. Regularization techniques such as early stopping
(monitor=validation loss and patience=10), data augmentation and dropout (dropout probability 0.5) [49] are
used to prevent the pre-trained neural network models from overfitting. The stochastic
gradient descent (SGD) optimizer with a momentum value of 0.9 and a maximum of 100 epochs
are used for all datasets during training. Other hyper-parameters, namely the batch size, the initial learn rate and the
learn rate drop factor used for the different datasets, are as follows:

— Columbia dataset, DSTok Dataset, Rahmouni et al. Dataset and JSSSTU CG and PG image dataset
VGG variants (VGG16 and VGG19): a batch size of 32 images, an initial learn rate of 1e-4 and a learn rate
drop factor (monitor=validation loss, factor=0.1, and patience=5) are used.

— JSSSTU PRCG image dataset
VGG variants (VGG16 and VGG19): a batch size of 16 images, an initial learn rate of 1e-4 and a learn rate
drop factor (monitor=validation loss, factor=0.1, and patience=5) are used.

— Data augmentation
To increase the diversity of image content, we employed data augmentation techniques such as
translation, rotation, shear, reflection and zooming. These techniques are used during the training of the
VGG16 and VGG19 pre-trained CNNs on the JSSSTU CG and PG image dataset, the JSSSTU PRCG image
dataset and the DSTok Dataset. These random transformations expose the model
to more aspects of the data and yield better generalization.
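The interaction between the learn rate drop (factor 0.1, patience 5) and early stopping (patience 10) can be illustrated with a simplified, framework-free simulation. This is only a sketch of plateau-based logic of the kind provided by Keras callbacks such as `ReduceLROnPlateau` and `EarlyStopping`; it omits options such as cooldown and minimum delta, and the validation losses below are made-up numbers.

```python
def run_schedule(val_losses, lr=1e-4, factor=0.1, lr_patience=5, stop_patience=10):
    """Return the (epoch, learn rate) history until early stopping triggers."""
    best = float("inf")
    since_best = 0     # epochs without improvement, used for early stopping
    since_drop = 0     # epochs without improvement since the last rate drop
    history = []
    for epoch, loss in enumerate(val_losses, start=1):
        if loss < best:
            best = loss
            since_best = 0
            since_drop = 0
        else:
            since_best += 1
            since_drop += 1
        if since_drop >= lr_patience:
            lr *= factor       # drop the learn rate after a plateau
            since_drop = 0
        history.append((epoch, lr))
        if since_best >= stop_patience:
            break              # stop training: no improvement for too long
    return history

# Hypothetical validation losses: three improving epochs, then a plateau.
losses = [1.0, 0.8, 0.7] + [0.75] * 12
history = run_schedule(losses)
```

In this toy run the learn rate is reduced after epoch 8 and again at epoch 13, where training also stops because the validation loss has not improved for 10 epochs.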


3.2.3. Experiment results 

The average classification accuracies obtained using handcrafted texture features and pre-trained CNNs
on our new datasets are tabulated in Table 8. As shown in Table 8, the pre-trained CNN techniques
achieve higher classification accuracy than the handcrafted features with the conventional SVM-based classifier.
VGG19 attains the best results, marginally better than VGG16 (94.46% versus 94.42% on the JSSSTU CG and PG
image dataset, and 89.37% versus 89.12% on the JSSSTU PRCG image dataset).


Table 8. Average classification accuracies (%) of handcrafted texture features and pre-trained CNN on our
new datasets

Dataset                          Handcrafted features    Pre-trained CNN
                                 GLCM      LBP           VGG16     VGG19
JSSSTU CG and PG image dataset   63.67     85.75         94.42     94.46
JSSSTU PRCG image dataset        63.50     75.62         89.12     89.37


3.2.4. Comparative analysis of benchmark datasets used to evaluate classification models 

Existing methods based on conventional machine learning and deep learning are compared in terms of
their performance on the existing benchmark datasets. The average accuracies attained are listed in Table 9,
ordered from highest to lowest. As seen from Table 9, the VGG19 pre-trained CNN
achieves 100% classification accuracy on the Columbia dataset. The feature fusion method based on
conventional machine learning proposed by Tokuda et al. [6] obtains the best identification accuracy on the
DSTok dataset. Further, the techniques presented in [29], [34], [35] attain 100% accuracy on the
Rahmouni et al. dataset.
Table 9. Comparative analysis of benchmark datasets used to evaluate classification models

Columbia dataset                        DSTok dataset                              Rahmouni et al. dataset
Method           Average accuracy (%)   Method              Average accuracy (%)   Method               Average accuracy (%)
VGG19            100                    Tokuda et al. [6]   97                     Yao et al. [35]      100
VGG16            99.37                  Ming He [30]        96                     Chawla et al. [34]   100
Cui et al. [31]  98                     Rezende et al. [33] 94                     Nguyen et al. [29]   100
LBP              97.50                  VGG19               90.92                  VGG19                95.96
Fan et al. [25]  93.51                  VGG16               89.94                  VGG16                95.42
GLCM             78.75                  LBP                 75.61                  Rahmouni et al. [7]  93.2
-                -                      GLCM                61.85                  LBP                  87.22
-                -                      -                   -                      GLCM                 66.52


3.2.5. Performance metrics 

Precision, recall and f-score [50] are used to assess the performance of the VGG19
pre-trained neural network model, as it yields the best classification accuracy compared with the handcrafted features and the
VGG16 pre-trained neural network model on the existing and proposed datasets. The macro average of
these metrics is computed over the two classes. Table 10 shows the resulting values for the
VGG19 pre-trained neural network model on the existing and proposed datasets.

As seen from Table 10, the lowest f-score is obtained on the JSSSTU PRCG image dataset.
The difference in f-score between the JSSSTU CG and PG image dataset and the DSTok dataset is only 0.03.
Hence, we conclude that our new datasets are very challenging. The DSTok dataset is as challenging as the JSSSTU
CG and PG image dataset, but it is smaller, contains a limited number of PRCG images,
and lacks images produced using recent rendering technology.


Table 10. Performance metrics used to assess the performance of VGG19 pre-trained neural network model
on different datasets

Dataset                                    Precision  Recall  F-score
Columbia Dataset                           1          1       1
Rahmouni et al. Dataset                    0.96       0.96    0.96
JSSSTU CG and PG image dataset (proposed)  0.94       0.94    0.94
DSTok Dataset                              0.91       0.91    0.91
JSSSTU PRCG image dataset (proposed)       0.89       0.89    0.89
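For reference, the macro-averaged precision, recall and f-score over two classes can be computed as sketched below. The labels and predictions are made-up values for illustration (assumed encoding: 0 = CG, 1 = PG) and are unrelated to the results in Table 10; the function name `macro_metrics` is hypothetical.

```python
import numpy as np

def macro_metrics(y_true, y_pred, classes=(0, 1)):
    """Per-class precision, recall and f-score, averaged over the classes."""
    y_true = np.asarray(y_true)
    y_pred = np.asarray(y_pred)
    precisions, recalls, f_scores = [], [], []
    for c in classes:
        tp = np.sum((y_pred == c) & (y_true == c))
        fp = np.sum((y_pred == c) & (y_true != c))
        fn = np.sum((y_pred != c) & (y_true == c))
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        total = precision + recall
        f_score = 2 * precision * recall / total if total else 0.0
        precisions.append(precision)
        recalls.append(recall)
        f_scores.append(f_score)
    return (float(np.mean(precisions)),
            float(np.mean(recalls)),
            float(np.mean(f_scores)))

# Hypothetical ground truth and predictions for eight test images.
y_true = [0, 0, 0, 0, 1, 1, 1, 1]
y_pred = [0, 0, 0, 1, 1, 1, 0, 0]
precision, recall, f_score = macro_metrics(y_true, y_pred)
```

Macro averaging gives each class equal weight, which is a natural choice here because the CG and PG classes contain the same number of samples in every dataset.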


4. CONCLUSION 

This work aimed at creating two new datasets, namely the ‘JSSSTU CG and PG image dataset’, a
heterogeneous dataset that comprises 14,000 samples, and the ‘JSSSTU PRCG image dataset’, which exhibits
photo-realism and contains 2,000 samples. Further, we implemented state-of-the-art techniques based on handcrafted
texture features and deep learning. The performance of these techniques is evaluated on our new datasets and on
benchmark datasets. Experimental results show that the pre-trained CNN techniques achieve higher
classification accuracy than the conventional SVM-based classifier. Further, we found that
the handcrafted features achieve better results on the Columbia Dataset
than on the other benchmark datasets and our new datasets. The VGG19 pre-trained
neural network attains good results on the ‘JSSSTU CG and PG image dataset’, but its
accuracy can still be improved. On the other hand, it achieves a lower detection rate on the ‘JSSSTU PRCG image dataset’
owing to the highly realistic images present in the dataset. Hence, an efficient and
robust technique is needed to solve this problem, and our new datasets will help researchers
working on the research problem “classification of computer graphic images and
photographic images” to evaluate their classification models. To the best of our knowledge, datasets of this kind
do not exist in the literature.